confidence sequence
- North America > Canada > Alberta (0.14)
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Data Science > Data Mining (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)
Best Arm Identification with LLM Judges and Limited Human
Ao, Ruicheng, Chen, Hongyu, Gao, Siyang, Li, Hanwei, Simchi-Levi, David
We study fixed-confidence best-arm identification (BAI) where a cheap but potentially biased proxy (e.g., LLM judge) is available for every sample, while an expensive ground-truth label can only be acquired selectively when using a human for auditing. Unlike classical multi-fidelity BAI, the proxy is biased (arm- and context-dependent) and ground truth is selectively observed. Consequently, standard multi-fidelity methods can mis-select the best arm, and uniform auditing, though accurate, wastes scarce resources and is inefficient. We prove that without bias correction and propensity adjustment, mis-selection probability may not vanish (even with unlimited proxy data). We then develop an estimator for the mean of each arm that combines proxy scores with inverse-propensity-weighted residuals and form anytime-valid confidence sequences for that estimator. Based on the estimator and confidence sequence, we propose an algorithm that adaptively selects and audits arms. The algorithm concentrates audits on unreliable contexts and close arms and we prove that a plug-in Neyman rule achieves near-oracle audit efficiency. Numerical experiments confirm the theoretical guarantees and demonstrate the superior empirical performance of the proposed algorithm.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Hong Kong (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences
Aolaritei, Liviu, Jordan, Michael I.
We study stopping rules for stochastic gradient descent (SGD) for convex optimization from the perspective of anytime-valid confidence sequences. Classical analyses of SGD provide convergence guarantees in expectation or at a fixed horizon, but offer no statistically valid way to assess, at an arbitrary time, how close the current iterate is to the optimum. We develop an anytime-valid, data-dependent upper confidence sequence for the weighted average suboptimality of projected SGD, constructed via nonnegative supermartingales and requiring no smoothness or strong convexity. This confidence sequence yields a simple stopping rule that is provably $\varepsilon$-optimal with probability at least $1-α$, with explicit bounds on the stopping time under standard stochastic approximation stepsizes. To the best of our knowledge, these are the first rigorous, time-uniform performance guarantees and finite-time $\varepsilon$-optimality certificates for projected SGD with general convex objectives, based solely on observable trajectory quantities.
- Asia > Middle East > Jordan (0.40)
- North America > United States > California > Alameda County > Berkeley (0.40)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California (0.04)
- North America > Puerto Rico (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > Promising Solution (0.67)
- Banking & Finance (1.00)
- Health & Medicine (0.93)
Confidence sequences for sampling without replacement Ian Waudby-Smith
We present a generic approach to constructing a frequentist CS using Bayesian tools, based on the fact that the ratio of a prior to the posterior at the ground truth is a martingale. We then present Hoeffding-and empirical-Bernstein-type time-uniform CSs and fixed-time confidence intervals for sampling WoR, which improve on previous bounds in the literature and explicitly quantify the benefit of WoR sampling.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada (0.04)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.86)
A Appendix
To quote Armitage [1993], "the classical theory of experimental design deals predominantly with A. Fisher, worked in agricultural research, where the outcome of a field trial is not available until Consider these problems first from the perspective of hypothesis testing . Perform the test too early, and the Type II error probability will be high, resulting in many small effects being undetected. For instance, consider one-sided "no-harm" testing applications where the goal is to detect (possibly To address this concern, the experimenter might wish to estimate the effect by computing a confidence interval. The experimenter cannot hope to "monitor" the effect of each arm by computing multiple fixed-n Instead of stopping the experiment at a predetermined sample size, it is more natural and useful for the stopping rule to be data-dependent in sequential applications. That is, to perform optional continuation .